assessment model
Collaborative Management for Chronic Diseases and Depression: A Double Heterogeneity-based Multi-Task Learning Method
Chai, Yidong, Liu, Haoxin, Xie, Jiaheng, Wang, Chaopeng, Fang, Xiao
Wearable sensor technologies and deep learning are transforming healthcare management. Yet, most health sensing studies focus narrowly on physical chronic diseases. This overlooks the critical need for joint assessment of comorbid physical chronic diseases and depression, which is essential for collaborative chronic care. We conceptualize multi-disease assessment, including both physical diseases and depression, as a multi-task learning (MTL) problem, where each disease assessment is modeled as a task. This joint formulation leverages inter-disease relationships to improve accuracy, but it also introduces the challenge of double heterogeneity: chronic diseases differ in their manifestation (disease heterogeneity), and patients with the same disease show varied patterns (patient heterogeneity). To address these issues, we first adopt existing techniques and propose a base method. Given the limitations of the base method, we further propose an Advanced Double Heterogeneity-based Multi-Task Learning (ADH-MTL) method that improves the base method through three innovations: (1) group-level modeling to support new patient predictions, (2) a decomposition strategy to reduce model complexity, and (3) a Bayesian network that explicitly captures dependencies while balancing similarities and differences across model components. Empirical evaluations on real-world wearable sensor data demonstrate that ADH-MTL significantly outperforms existing baselines, and each of its innovations is shown to be effective. This study contributes to health information systems by offering a computational solution for integrated physical and mental healthcare and provides design principles for advancing collaborative chronic disease management across the pre-treatment, treatment, and post-treatment phases.
Collateral Damage Assessment Model for AI System Target Engagement in Military Operations
Maathuis, Clara, Cools, Kasper
Abstract--In an era where AI (Artificial Intelligence) systems play an increasing role in the battlefield, ensuring responsible targeting demands rigorous assessment of potential collateral effects. In this context, a novel collateral damage assessment model for target engagement of AI systems in military operations is introduced. Its layered structure captures the categories and architectural components of the AI systems to be engaged together with corresponding engaging vectors and contextual aspects. At the same time, spreading, severity, likelihood, and evaluation metrics are considered in order to provide a clear representation enhanced by transparent reasoning mechanisms. Further, the model is demonstrated and evaluated through instantiation which serves as a basis for further dedicated efforts that aim at building responsible and trustworthy intelligent systems for assessing the effects produced by engaging AI systems in military operations.
APT: Improving Specialist LLM Performance with Weakness Case Acquisition and Iterative Preference Training
Rao, Jun, Lin, Zepeng, Liu, Xuebo, Ke, Xiaopeng, Lian, Lian, Jin, Dong, Cheng, Shengjun, Yu, Jun, Zhang, Min
Large Language Models (LLMs) often require domain-specific fine-tuning to address targeted tasks, which risks degrading their general capabilities. Maintaining a balance between domain-specific enhancements and general model utility is a key challenge. This paper proposes a novel approach named APT (Weakness Case Acquisition and Iterative Preference Training) to enhance domain-specific performance with self-generated dis-preferred weakness data (bad cases and similar cases). APT uniquely focuses on training the model using only those samples where errors occur, alongside a small, similar set of samples retrieved for this purpose. This targeted training minimizes interference with the model's existing knowledge base, effectively retaining generic capabilities. Experimental results on the LLama-2 and Mistral-V0.3 models across various benchmarks demonstrate that APT ensures no reduction in generic capacity and achieves superior performance on downstream tasks compared to various existing methods. This validates our method as an effective strategy for enhancing domain-specific capabilities without sacrificing the model's broader applicability.
From Motion Signals to Insights: A Unified Framework for Student Behavior Analysis and Feedback in Physical Education Classes
Gao, Xian, Ruan, Jiacheng, Gao, Jingsheng, Xie, Mingye, Zhang, Zongyun, Liu, Ting, Fu, Yuzhuo
Analyzing student behavior in educational scenarios is crucial for enhancing teaching quality and student engagement. Existing AI-based models often rely on classroom video footage to identify and analyze student behavior. While these video-based methods can partially capture and analyze student actions, they struggle to accurately track each student's actions in physical education classes, which take place in outdoor, open spaces with diverse activities, and are challenging to generalize to the specialized technical movements involved in these settings. Furthermore, current methods typically lack the ability to integrate specialized pedagogical knowledge, limiting their ability to provide in-depth insights into student behavior and offer feedback for optimizing instructional design. To address these limitations, we propose a unified end-to-end framework that leverages human activity recognition technologies based on motion signals, combined with advanced large language models, to conduct more detailed analyses and feedback of student behavior in physical education classes. Our framework begins with the teacher's instructional designs and the motion signals from students during physical education sessions, ultimately generating automated reports with teaching insights and suggestions for improving both learning and class instructions. This solution provides a motion signal-based approach for analyzing student behavior and optimizing instructional design tailored to physical education classes. Experimental results demonstrate that our framework can accurately identify student behaviors and produce meaningful pedagogical insights.
Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features
Zezario, Ryandhimas E., Fu, Szu-Wei, Chen, Fei, Fuh, Chiou-Shann, Wang, Hsin-Min, Tsao, Yu
Abstract--In this study, we propose a cross-domain multiobjective 2.478 in unseen noise environments) over a CNN-based baseline speech assessment model called MOSA-Net, which SE model. Index Terms--non-intrusive speech assessment models, deep More specifically, MOSA-Net is designed to estimate the speech learning, multi-objective learning, speech enhancement. PEECH assessment metrics are indicators that quantitatively measure the specific attributes of speech signals. LCC by 0.021 (0.985 vs 0.964 in seen noise environments) For example, QIA-SE can improve PESQ by 0.301 Ryandhimas E. Zezario is with the Department of Computer Science and Fei Chen is with the Department of Electrical and Electronic Engineering, Southern University of Science and Technology of China, Shenzhen, China. Hsin-Min Wang is with the Institute of Information Science, Academia Sinica, Taipei, Taiwan. This testing strategy is prohibitive To attain a higher assessment accuracy, the MBNet adopts the and may not always be feasible. Hence, several objective BiasNet architecture to compensate for the biased scores of a evaluations metrics have been developed as surrogates for certain judge [49], In addition, the multi-task learning criterion human listening tests [6]-[31]. Meanwhile, different acoustic comprises two stages. The first stage includes a series of features are used as input to the assessment model to consider signal processing units designed to convert speech waveforms information from different acoustic domains [51], [52].
Smart Audit System Empowered by LLM
Yao, Xu, Wu, Xiaoxu, Li, Xi, Xu, Huan, Li, Chenlei, Huang, Ping, Li, Si, Ma, Xiaoning, Shan, Jiulong
Manufacturing quality audits are pivotal for ensuring high product standards in mass production environments. Traditional auditing processes, however, are labor-intensive and reliant on human expertise, posing challenges in maintaining transparency, accountability, and continuous improvement across complex global supply chains. To address these challenges, we propose a smart audit system empowered by large language models (LLMs). Our approach introduces three innovations: a dynamic risk assessment model that streamlines audit procedures and optimizes resource allocation; a manufacturing compliance copilot that enhances data processing, retrieval, and evaluation for a self-evolving manufacturing knowledge base; and a Re-act framework commonality analysis agent that provides real-time, customized analysis to empower engineers with insights for supplier improvement. These enhancements elevate audit efficiency and effectiveness, with testing scenarios demonstrating an improvement of over 24%.
When Automated Assessment Meets Automated Content Generation: Examining Text Quality in the Era of GPTs
Bevilacqua, Marialena, Oketch, Kezia, Qin, Ruiyang, Stamey, Will, Zhang, Xinyuan, Gan, Yi, Yang, Kai, Abbasi, Ahmed
The use of machine learning (ML) models to assess and score textual data has become increasingly pervasive in an array of contexts including natural language processing, information retrieval, search and recommendation, and credibility assessment of online content. A significant disruption at the intersection of ML and text are text-generating large-language models such as generative pre-trained transformers (GPTs). We empirically assess the differences in how ML-based scoring models trained on human content assess the quality of content generated by humans versus GPTs. To do so, we propose an analysis framework that encompasses essay scoring ML-models, human and ML-generated essays, and a statistical model that parsimoniously considers the impact of type of respondent, prompt genre, and the ML model used for assessment model. A rich testbed is utilized that encompasses 18,460 human-generated and GPT-based essays. Results of our benchmark analysis reveal that transformer pretrained language models (PLMs) more accurately score human essay quality as compared to CNN/RNN and feature-based ML methods. Interestingly, we find that the transformer PLMs tend to score GPT-generated text 10-15\% higher on average, relative to human-authored documents. Conversely, traditional deep learning and feature-based ML models score human text considerably higher. Further analysis reveals that although the transformer PLMs are exclusively fine-tuned on human text, they more prominently attend to certain tokens appearing only in GPT-generated text, possibly due to familiarity/overlap in pre-training. Our framework and results have implications for text classification settings where automated scoring of text is likely to be disrupted by generative AI.
Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model
Zezario, Ryandhimas E., Bai, Bo-Ren Brian, Fuh, Chiou-Shann, Wang, Hsin-Min, Tsao, Yu
This study proposes a multi-task pseudo-label learning (MPL)-based non-intrusive speech quality assessment model called MTQ-Net. MPL consists of two stages: obtaining pseudo-label scores from a pretrained model and performing multi-task learning. The 3QUEST metrics, namely Speech-MOS (S-MOS), Noise-MOS (N-MOS), and General-MOS (G-MOS), are the assessment targets. The pretrained MOSA-Net model is utilized to estimate three pseudo labels: perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and speech distortion index (SDI). Multi-task learning is then employed to train MTQ-Net by combining a supervised loss (derived from the difference between the estimated score and the ground-truth label) and a semi-supervised loss (derived from the difference between the estimated score and the pseudo label), where the Huber loss is employed as the loss function. Experimental results first demonstrate the advantages of MPL compared to training a model from scratch and using a direct knowledge transfer mechanism. Second, the benefit of the Huber loss for improving the predictive ability of MTQ-Net is verified. Finally, the MTQ-Net with the MPL approach exhibits higher overall predictive power compared to other SSL-based speech assessment models.
Rapid building damage assessment workflow: An implementation for the 2023 Rolling Fork, Mississippi tornado event
Robinson, Caleb, Nsutezo, Simone Fobi, Ortiz, Anthony, Sederholm, Tina, Dodhia, Rahul, Birge, Cameron, Richards, Kasie, Pitcher, Kris, Duarte, Paulo, Ferres, Juan M. Lavista
Rapid and accurate building damage assessments from high-resolution satellite imagery following a natural disaster is essential to inform and optimize first responder efforts. However, performing such building damage assessments in an automated manner is non-trivial due to the challenges posed by variations in disaster-specific damage, diversity in satellite imagery, and the dearth of extensive, labeled datasets. To circumvent these issues, this paper introduces a human-in-the-loop workflow for rapidly training building damage assessment models after a natural disaster. This article details a case study using this workflow, executed in partnership with the American Red Cross during a tornado event in Rolling Fork, Mississippi in March, 2023. The output from our human-in-the-loop modeling process achieved a precision of 0.86 and recall of 0.80 for damaged buildings when compared to ground truth data collected post-disaster. This workflow was implemented end-to-end in under 2 hours per satellite imagery scene, highlighting its potential for real-time deployment.
Transferable Deep Learning Power System Short-Term Voltage Stability Assessment with Physics-Informed Topological Feature Engineering
Feng, Zijian, Chen, Xin, Lv, Zijian, Sun, Peiyuan, Wu, Kai
Deep learning (DL) algorithms have been widely applied to short-term voltage stability (STVS) assessment in power systems. However, transferring the knowledge learned in one power grid to other power grids with topology changes is still a challenging task. This paper proposed a transferable DL-based model for STVS assessment by constructing the topology-aware voltage dynamic features from raw PMU data. Since the reactive power flow and grid topology are essential to voltage stability, the topology-aware and physics-informed voltage dynamic features are utilized to effectively represent the topological and temporal patterns from post-disturbance system dynamic trajectories. The proposed DL-based STVS assessment model is tested under random operating conditions on the New England 39-bus system. It has 99.99\% classification accuracy of the short-term voltage stability status using the topology-aware and physics-informed voltage dynamic features. In addition to high accuracy, the experiments show good adaptability to PMU errors. Moreover, The proposed STVS assessment method has outstanding performance on new grid topologies after fine-tuning. In particular, the highest accuracy reaches 99.68\% in evaluation, which demonstrates a good knowledge transfer ability of the proposed model for power grid topology change.